DistilBERT word2vec 256k MLM 250k
This model combines a word2vec embedding layer (256k-token vocabulary) with the DistilBERT architecture and is intended for general natural language processing tasks. The embedding layer is initialized from word2vec vectors trained on large-scale corpora and kept frozen, while the rest of the model is trained with masked language modeling for 250k steps.
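A minimal usage sketch with the Transformers library is shown below. The repository id is an assumption (substitute the actual model id on the Hub), and the explicit embedding-freezing step is only needed if you continue training; it mirrors the frozen-embedding setup described above.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Hypothetical repository id; replace with the model's actual Hub id.
model_id = "distilbert-word2vec_256k-MLM_250k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Keep the word2vec-initialized embedding layer frozen if fine-tuning further.
for param in model.distilbert.embeddings.word_embeddings.parameters():
    param.requires_grad = False

# Example: masked-token prediction.
inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at the [MASK] position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```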